Semantics Altering Modifications for Evaluating Comprehension in Machine Reading


Abstract

Advances in NLP have yielded impressive results for the task of machine reading comprehension (MRC), with approaches having been reported to achieve performance comparable to that of humans. In this paper, we investigate whether state-of-the-art MRC models are able to correctly process Semantics Altering Modifications (SAM): linguistically-motivated phenomena that alter the semantics of a sentence while preserving most of its lexical surface form. We present a method to automatically generate and align challenge sets featuring original and altered examples. We further propose a novel evaluation methodology to assess the capability of MRC systems to process these examples independent of the data they were optimised on, by discounting for effects introduced by domain shift. In a large-scale empirical study, we apply the methodology in order to evaluate extractive MRC models with regard to their capability to process SAM-enriched data. We comprehensively cover 12 different state-of-the-art neural architecture configurations and four training datasets and find that, despite their well-known remarkable performance, optimised models consistently struggle to correctly process semantically altered data.
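To make the notion of a SAM concrete, the following is a minimal sketch of one aligned original/altered pair. It is an illustration only: the abstract does not describe the generation procedure, so the helper names (`AlignedPair`, `insert_modifier`, `token_overlap`), the example sentence, and the use of Jaccard overlap as a stand-in for "lexical surface form" are all assumptions rather than the authors' method.

```python
from dataclasses import dataclass


@dataclass
class AlignedPair:
    """An original sentence aligned with its semantically altered variant."""
    original: str
    altered: str


def insert_modifier(sentence: str, before_word: str, modifier: str) -> AlignedPair:
    """Insert `modifier` before the first occurrence of `before_word`.

    Hypothetical helper: a single inserted word stands in for a
    semantics-altering modification.
    """
    tokens = sentence.split()
    idx = tokens.index(before_word)  # raises ValueError if the word is absent
    altered = tokens[:idx] + [modifier] + tokens[idx:]
    return AlignedPair(original=sentence, altered=" ".join(altered))


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the whitespace-token sets of two sentences."""
    set_a, set_b = set(a.split()), set(b.split())
    return len(set_a & set_b) / len(set_a | set_b)


# Illustrative example: "almost" reverses whether the goal happened,
# yet nearly every token is shared between the two sentences.
pair = insert_modifier("The striker scored in the final minute", "scored", "almost")
overlap = token_overlap(pair.original, pair.altered)
```

Here the altered sentence differs from the original by one token, so the overlap stays high even though the answer to a question such as "Did the striker score?" changes. A model that relies mainly on lexical matching would be expected to give the same answer for both sentences.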


Similar resources

Evaluating Machine Reading Systems through Comprehension Tests

This paper describes a methodology for testing and evaluating the performance of Machine Reading systems through Question Answering and Reading Comprehension Tests. The methodology is being used in QA4MRE (QA for Machine Reading Evaluation), one of the labs of CLEF. We report here the conclusions and lessons learned after the first campaign in 2011.


Adversarial Examples for Evaluating Reading Comprehension Systems

Standard accuracy metrics indicate that reading comprehension systems are making rapid progress, but the extent to which these systems truly understand language remains unclear. To reward systems with real language understanding abilities, we propose an adversarial evaluation scheme for the Stanford Question Answering Dataset (SQuAD). Our method tests whether systems can answer questions about ...


Stochastic Answer Networks for Machine Reading Comprehension

We propose a simple yet robust stochastic answer network (SAN) that simulates multistep reasoning in machine reading comprehension. Compared to previous work such as ReasoNet, the unique feature is the use of a kind of stochastic prediction dropout on the answer module (final layer) of the neural network during the training. We show that this simple trick improves robustness and achieves result...


Machine Comprehension with Syntax, Frames, and Semantics

We demonstrate significant improvement on the MCTest question answering task (Richardson et al., 2013) by augmenting baseline features with features based on syntax, frame semantics, coreference, and word embeddings, and combining them in a max-margin learning framework. We achieve the best results we are aware of on this dataset, outperforming concurrently published results. These results demon...


Evaluating the Meaning of Answers to Reading Comprehension Questions: A Semantics-Based Approach

There is a rise in interest in the evaluation of meaning in real-life applications, e.g., for assessing the content of short answers. The approaches typically use a combination of shallow and deep representations, but little use is made of the semantic formalisms created by theoretical linguists to represent meaning. In this paper, we explore the use of the underspecified semantic formalism LRS...



Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i15.17622